November 29, 2022 - North Sea Group

(1) Background: Book on Legal Probabilism

(2) Three Probabilisms

A rational agent’s credal state can be modeled by:

  • a single probability measure (Precise Probabilism)

  • a set of probability measures (Imprecise Probabilism)

  • a distribution over probability measures (Higher-order Probabilism)

Troubles for Precise and Imprecise Probabilism

Fair Coin v. Unknown Bias

Coin is known to be fair v. The bias of the coin is unknown

  • Precise Probabilism cannot distinguish between Fair Coin and Unknown Bias

  • Imprecise Probabilism can, but…
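
Concretely (my gloss): a precise credal state assigns \(\mathsf{P}(\text{heads}) = .5\) in both scenarios, so they collapse into one; an imprecise credal state can render Fair Coin as the singleton \(\{.5\}\) and Unknown Bias as the whole interval of biases \([0, 1]\).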

Two Biases v. Unbalanced Biases

The bias of the coin is either .4 or .6 v. Bias .4 is three times more likely than bias .6

  • Imprecise Probabilism cannot distinguish between Two Biases and Unbalanced Biases

Higher-order Probabilism Fares Better
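
A worked contrast (my numbers; "three times more likely" read as a 3:1 split, and Two Biases modeled with an even split): Imprecise Probabilism represents both scenarios by the same set of measures, \(\{\mathsf{Bern}(.4), \mathsf{Bern}(.6)\}\), whereas higher-order distributions separate them and yield different predictive probabilities of heads:

\[.5(.4) + .5(.6) = .5 \quad\text{v.}\quad .75(.4) + .25(.6) = .45\]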

(3) Accuracy and Higher-order Probabilism

  • Besides its ability to model scenarios, is there a normative reason to prefer higher-order probabilism?

  • Stalemate in the Sjerps-Taroni debate

Claim

Probability Mass Functions (PMFs) based on point-estimates are systematically less accurate than Posterior Predictive Distributions based on distribution-estimates

Accuracy is quantified by Kullback-Leibler divergence, which measures how close one distribution (say, the distribution whose accuracy we want to quantify) is to another (say, the true distribution)

Actual PMF v. PMF Based on Point Estimate v. Posterior Predictive Distribution

Distance from Actual Probability Mass Function
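
A minimal numeric sketch of this comparison (my construction, not the talk's code; assumed setup: true bias .6, flat Beta(1,1) prior, 7 heads observed in 10 tosses, predicting the number of heads in 10 future tosses):

```r
# KL(p || q) in bits; p plays the role of the true distribution
kl <- function(p, q) sum(p * log2(p / q))

n <- 10; k <- 0:n                        # future tosses, possible head counts
truth <- dbinom(k, n, 0.6)               # actual PMF

heads <- 7; tosses <- 10                 # observed data
point <- dbinom(k, n, heads / tosses)    # PMF from the point estimate .7

a <- 1 + heads; b <- 1 + tosses - heads  # posterior is Beta(8, 4)
postpred <- choose(n, k) * beta(a + k, b + n - k) / beta(a, b)  # Beta-Binomial

kl(truth, point)     # ~ .33 bits away from the actual PMF
kl(truth, postpred)  # ~ .20 bits: the posterior predictive sits closer
```

On this one dataset the posterior predictive wins; the claim above is that this holds systematically, i.e. in expectation over datasets.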

(4) Higher-order Bayesian Networks

(4) Higher-order Bayesian Networks (continued)

(5) Weight of Evidence

Precursors

  • Beans from a bag, two colors, same observed proportion, different sample sizes (Peirce 1872)

  • Balance v. weight of evidence (Keynes 1921)

Desiderata

  • Weak Increase: In Bernoulli trials, weight increases with sample size, holding the observed frequency fixed

  • No Monotonicity: Weight need not always increase as more evidence is accumulated

Existing accounts

Precise and imprecise probabilism do not offer a satisfactory account of weight of evidence

Information-theoretic Weight

Key notions:

  • Surprise: \(1/\mathsf{P}(x)\)

  • Shannon information: \(\log_2(\mathsf{surprise}) = - \log_2(\mathsf{P}(x))\)

  • Entropy is average Shannon information:

\[H(X) = - \sum_i \mathsf{P}(x_i) \log_2 \mathsf{P}(x_i)\]

(the expected amount of information you receive once you learn what the value of \(X\) is).

  • Weight of a distribution:

\[\mathsf{w(P)} = 1 - \left( \frac{H(\mathsf{P})}{H(\mathsf{uniform})}\right)\]

(the more informative a distribution is, compared to the uniform, the more weight it has, on a scale from 0 to 1)
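
A quick worked case (my numbers): on a two-element space the uniform distribution has \(H = 1\) bit, so its weight is \(1 - \frac{1}{1} = 0\); a \((.9, .1)\) distribution has

\[H = -.9 \log_2 .9 - .1 \log_2 .1 \approx .469, \qquad \mathsf{w} = 1 - \frac{.469}{1} \approx .531\]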

Weight of a Distribution: Examples

Weak Increase Holds but Monotonicity Fails
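
Both behaviors can be reproduced with a small computation (my sketch, not the talk's code; it assumes a uniform prior over an 11-point grid of candidate biases and applies \(\mathsf{w}\) to the posterior over that grid):

```r
# Weight w(P) = 1 - H(P)/H(uniform) of a posterior over a discretized bias
entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
weight  <- function(p) 1 - entropy(p) / log2(length(p))

grid <- seq(0, 1, by = 0.1)    # candidate biases
posterior <- function(heads, tosses) {
  like <- grid^heads * (1 - grid)^(tosses - heads)
  like / sum(like)             # the uniform prior cancels out
}

# Weak Increase: frequency of heads fixed at 60%, sample size growing
weight(posterior(6, 10)); weight(posterior(60, 100)); weight(posterior(600, 1000))

# No Monotonicity: a run of heads, then one surprising tail
weight(posterior(10, 10))  # posterior piles up near 1: high weight
weight(posterior(10, 11))  # the tail spreads it out again: weight drops
```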

(6) Resilience and Completeness

  • Defendants may argue on appeal that, had other evidence been considered, the verdict would have changed (violation of resilience)

  • Litigation routinely considers questions of missing evidence and what remedies should be granted to the litigants when evidence is missing (violation of completeness)

Open questions

  • How does the information-theoretic account of weight compare to notions such as resilience and completeness, which arguably can be modeled by precise probabilism?

  • Does an adequate evaluation of the evidence in the trial context only require three notions—probability, resilience and completeness—making weight of evidence redundant?

Thank you!

EXTRA: Good on Weight of Evidence

  • \(W(H:E)\) is some function of \(\mathsf{P}(E\vert H), \mathsf{P}(E\vert \neg H)\)

  • \(\mathsf{P}(H \vert E) = g[W(H:E), \mathsf{P}(H)]\)

  • \(W(H: E_1 \wedge E_2) = W(H:E_1) + W(H:E_2 \vert E_1)\)

\[W(H:E) = \log \frac{\mathsf{P}(E \vert H)}{\mathsf{P}(E\vert \neg H)}\]

Good’s weight is not what we’re after

Good’s own example (expanded)

  • A die is selected at random from nine fair dice and one biased die whose probability of showing a six is \(\frac{1}{3}\).

  • Uniform prior gives you weight of evidence for the loaded die \(\log_{10}(.1)\), that is, -1 (-10 db).

  • Every time you toss it and obtain a six, you gain \(\log_{10}\left(\frac{1/3}{1/6}\right) = \log_{10}(2)\).

  • Every time you toss it and obtain something else, the weight changes by \(\log_{10}\left(\frac{2/3}{5/6}\right) = \log_{10}(.8)\).

Good’s Weight Fails at Weak Increase
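
A sketch of the failure (my numbers): hold the frequency of sixes at the fair value \(\frac{1}{6}\) and let the number of tosses grow; Good's cumulative weight scales linearly with sample size and here becomes ever more negative instead of increasing:

```r
# Good's cumulative weight (base-10 log-likelihood ratio) for the
# loaded-die hypothesis, given a count of sixes in a number of tosses
good_weight <- function(sixes, tosses) {
  sixes * log10((1/3) / (1/6)) + (tosses - sixes) * log10((2/3) / (5/6))
}
good_weight(1, 6)      # ~ -0.18
good_weight(10, 60)    # ~ -1.8
good_weight(100, 600)  # ~ -18.4: more data, lower weight, so Weak Increase fails
```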


EXTRA: Joyce on Weight of Evidence

Joyce:

\[w(X,E) = \sum_x \left\vert\, c(ch(X) = x \vert E)\,(x - c(X \vert E))^2 - c(ch(X) = x)\,(x - c(X))^2 \,\right\vert\]

| hypotheses | .4 | .5 | .6 |
|---|---|---|---|
| credences | 1/3 | 1/3 | 1/3 |
| \(c(X) = \sum_x c(ch(X) = x)\, x\) | .5 | .5 | .5 |
| \(c(E \vert ch(X) = x)\) | .042 | .117 | .214 |
| \(c(E) = \sum_x c(E \vert ch(X) = x)\, c(ch(X) = x)\) | .124 | .124 | .124 |
| \(c(ch(X) = x \vert E)\) | .113 | .312 | .573 |
| \(c(X \vert E) = \sum_x c(ch(X) = x \vert E)\, x\) | .54 | .54 | .54 |
| prior weights | .01 | 0 | .01 |
| posterior weights | .021 | .002 | .002 |
| \(w\) | .0066 | .0066 | .0066 |
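
A quick check of the table (my reconstruction; the likelihood row matches \(E\) = "7 heads in 10 tosses," which the slide leaves implicit):

```r
x     <- c(.4, .5, .6)                     # hypotheses about the chance ch(X)
prior <- rep(1/3, 3)                       # credences c(ch(X) = x)
like  <- dbinom(7, 10, x)                  # c(E | ch(X) = x): .042 .117 .214
post  <- like * prior / sum(like * prior)  # c(ch(X) = x | E): .113 .313 .574
cX  <- sum(prior * x)                      # c(X) = .5
cXE <- sum(post * x)                       # c(X | E) ~ .55
(x - cX)^2                                 # prior squared deviations: .01 0 .01
(x - cXE)^2                                # posterior squared deviations
```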